A Statistical Model for Hangeul-Hanja Conversion in Terminology Domain
Abstract
Sino-Korean words, which were historically borrowed from Chinese, can be written in either Hanja (Chinese characters) or Hangeul (Korean characters). Previous Korean Input Method Editors (IMEs) provide only a simple dictionary-based approach to Hangeul-Hanja conversion. This paper presents a sentence-based statistical model for Hangeul-Hanja conversion, with word tokenization included as a hidden process. As a result, we reach 91.4% character accuracy and 81.4% word accuracy in the terminology domain, even when only very limited Hanja data is available.
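To make the idea of sentence-level conversion with tokenization as a hidden process more concrete, the following is a minimal sketch, not the model from the paper. It assumes a hypothetical toy lexicon with made-up unigram probabilities and uses dynamic programming to pick the word segmentation and Hanja forms jointly, rather than tokenizing first and converting afterwards. The names LEXICON, MAX_WORD_LEN, UNKNOWN_LOGP, and convert are illustrative assumptions.

```python
import math

# Hypothetical toy lexicon: Hangeul word -> list of (written form, probability).
# Entries and probabilities are invented for illustration only.
LEXICON = {
    "대학": [("大學", 0.9), ("대학", 0.1)],
    "학교": [("學校", 0.9), ("학교", 0.1)],
    "대": [("大", 0.5), ("代", 0.3), ("대", 0.2)],
    "학": [("學", 0.7), ("학", 0.3)],
    "교": [("校", 0.6), ("敎", 0.3), ("교", 0.1)],
}
MAX_WORD_LEN = 4                # longest word considered, in syllables
UNKNOWN_LOGP = math.log(1e-6)   # penalty for copying an unknown syllable as-is


def candidates(word):
    """Return (form, log-probability) pairs for one candidate word."""
    if word in LEXICON:
        return [(form, math.log(p)) for form, p in LEXICON[word]]
    if len(word) == 1:
        # Unknown single syllable: keep it in Hangeul with a fixed penalty.
        return [(word, UNKNOWN_LOGP)]
    return []


def convert(sentence):
    """Jointly choose segmentation and forms; return (text, words, log-prob)."""
    n = len(sentence)
    # best[i] = (log-prob, start of last word, form of last word) for sentence[:i]
    best = [(-math.inf, None, None)] * (n + 1)
    best[0] = (0.0, None, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            prev_logp = best[start][0]
            if prev_logp == -math.inf:
                continue
            for form, logp in candidates(sentence[start:end]):
                score = prev_logp + logp
                if score > best[end][0]:
                    best[end] = (score, start, form)
    # Follow back-pointers to recover the chosen words.
    words, i = [], n
    while i > 0:
        _, start, form = best[i]
        words.append(form)
        i = start
    words.reverse()
    return "".join(words), words, best[n][0]


if __name__ == "__main__":
    print(convert("대학교"))
```

In this toy setting the segmentation is never given as input: it falls out of the search for the highest-scoring path. A realistic system would replace the unigram scores with a trained language model and conversion probabilities.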
Similar resources
Missionary contributions toward the revaluation of Hangeul in late nineteenth-century Korea
Soon after their arrival in Korea, Christian missionaries were confronted by decisions regarding how they would present written materials to the Korean people. While many Koreans used their indigenous script (Hangeul) for everyday purposes, higher status literacy materials were expected to be presented using Chinese characters (Hanja), a system unfamiliar to most but considered more prestigious...
Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems
This article examines methods for optimizing the performance of GMM-based voice conversion systems, taking the GMM method as the baseline to be improved. In current methods, because a single conversion function is used to convert all speech units, and because of the subsequent spectral smoothing that arises from statistical averaging, we will observe quality ...
Sampling Rate Conversion in the Discrete Linear Canonical Transform Domain
Sampling rate conversion (SRC) is one of the important issues in modern sampling theory. It can be realized by up-sampling, filtering, and down-sampling operations, which incur high computational complexity. Although some efficient algorithms have been presented for sampling rate conversion, they all need to compute the N-point original signal to obtain the up-sampling or the down-sampling signal in the tim...
Unicode Canonical Decomposition for Hangeul Syllables in Regular Expression
Owing to the high expressiveness of regular expressions, they are frequently used for searching and manipulating text-based data. Regular expressions are highly applicable to processing Latin-alphabet-based text, but the same cannot be said for Hangeul, the writing system of the Korean language. Although Hangeul possesses alphabetic features within the script, expressiveness of regular expression pat...
Dissociative Disturbance in Hangul-Hanja Reading after a Left Posterior Occipital Lesion
Since the Korean language has two distinct writing systems, phonogram (Hangul) and ideogram (Hanja: Chinese characters), alexia can present with dissociative disturbances in reading between the two systems. A 74-year-old right-handed man presented with a prominent reading impairment in Hangul with agraphia of both Hangul and Hanja after a left posterior occipital-parietal lesion. He could not ...